SUMMARY


Approximately 3500 common one- and two-syllable words were transcribed
phonetically. The sequences of speech sounds were analyzed in terms of (a)
the probability of each sound, (b) the conditional probability of each pair
of sounds, and (c) the joint probability of each digram. The maximum
information (H) per symbol in an alphabet of 41 symbols (the number treated)
would be 5.35 bits (F0). In the sample studied, the obtained values were
4.15-5.04 bits (F1) and 3.35-4.21 bits per symbol (F2). Some transitional
probabilities reached 0.33. Among words of a particular length in syllables,
the words of few sounds contained the highest information value per symbol.

Digrams of greatest conditional and joint probabilities are enumerated
in tables.


INTRODUCTION 

Knowledge of the statistics of the English language is limited and the
scope of the topic forbidding. Inroads are exemplified by the works of
Thorndike (18, 19, 20), Dewey (6), and Pratt (14). Thorndike's concern was
the frequency of usage of the word in printed language of "common knowledge".
Other tabulations of words are available, most of them word counts of the
language of special groups, particularly of age levels (1, 4, 5, 7, 8, 9).
This approach to language, aside from an applied connotation, focuses upon
the probability of a word.

Dewey added phonetic tabulations, or "sound counts," and syllable
tabulations to an enumeration of the frequency of common words. His work
formed the basis for the development of a system of shorthand. As Morse code
devotes minimum space to the frequent letter e, so efficient transcribing of
acoustic language by shorthand is facilitated when simple characters
represent the most frequent units of speech. Voelker (21, 22, 23, 24, 25)
added relevant studies that were less extensive than Dewey's. These
investigations permit statements of the probability of speech sounds.

Pratt, in the context of cryptography, emphasized the aspect of
probability with respect to letters in printed English. In addition to the
frequency of single letters and their occurrence in initial and terminal
positions in words, he determined the relative frequency of digrams and
trigrams and enumerated the more frequent trigrams.1


1 Pratt used bigram to denote a pair of letters. Digram is in current use.
Digram may seem to have appropriateness for written language, not spoken
language. However, the word has been accepted by electrical engineers in
reference to pulses, etc. Since the meaning of digram has already been
"extended", since diphthong is not available for the present meaning, and a
new word such as biphthong would be cumbersome, digram is used here to mean
two sounds.


Among other approaches to the statistics of language, Newman (13)
applied autocorrelation techniques to the vowel-consonant sequences and Lotz
(11) and Menzerath (12) developed graphical representations of "frequency of
occurrence". Information theory (17, 26) provides another technique for
quantifying language. This methodology includes (a) a unit of measurement,
the bit, (b) a point of view that is intuitively valuable, "the reduction of
uncertainty", and (c) a mathematical treatment. The receiver or listener is
viewed as being in a state of uncertainty about which symbol of a prescribed
set he is to receive. If the symbols (sounds, letters, words, etc.)
contribute equally to reducing this uncertainty, the symbols will be
operating at the limit of their possibilities insofar as information
transmission is concerned. This high efficiency of information transmitted
per symbol could obtain only with independence among the symbols, i.e., with
no "intersymbol influence", and if the symbols did occur with equal
probability there would be no intersymbol influence. A circumstance in which
each symbol of the collection of symbols has an equal opportunity of
occurring next does not obtain in any aspect of English: letters are not
equally frequent, sounds are not equally probable, some words do not
ordinarily follow other words, etc. Information theory assesses information
per symbol as the amount by which uncertainty is reduced on the average on
receiving one symbol. The quantitative difference between the maximum
information per symbol and the actual information per symbol, divided by the
maximum information per symbol, is called redundancy. With no redundancy in
the language, each symbol would occur with equal probability, and as the
intersymbol influence increases or the probability of one symbol approaches
unity, the redundancy of the language approaches a maximum value of unity.
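
The definition of redundancy given above can be illustrated with a short
computation. The following is a minimal sketch, not part of the original
study; the function names are hypothetical, and the two sample values
(5.04 and 3.35 bits against a maximum of 5.35 bits) are taken from the
results reported later in this paper.

```python
import math


def max_information(n_symbols: int) -> float:
    """Maximum information per symbol (bits) for equally likely symbols."""
    return math.log2(n_symbols)


def redundancy(h_actual: float, h_max: float) -> float:
    """Redundancy = (maximum information - actual information) / maximum."""
    return (h_max - h_actual) / h_max


if __name__ == "__main__":
    # An alphabet of 41 sounds, as treated in this study.
    print(max_information(41))
    # Redundancy for two of the reported values, using the paper's 5.35 bits.
    print(round(redundancy(5.04, 5.35), 2))
    print(round(redundancy(3.35, 5.35), 2))
```

With no redundancy the actual and maximum values coincide and the ratio is
zero; as the actual information falls toward zero the ratio approaches unity.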


PROBLEM


The present study is a "finger exercise" in applying elementary
techniques of information theory to phonetic probability. One- and
two-syllable isolated words were sampled with respect to (a) the relative
frequency of speech sounds, p(i) (to be read "the probability of sound i"),
(b) the relative frequency of occurrence of the sounds at different
positions in the word, (c) the probability of two sounds occurring in
succession, p(i,j) (to be read "the joint probability of a sequence of
sounds designated i and j"), (d) the probability that one sound follows
another, p_i(j) (read "the conditional probability that sound j follows
sound i"), and (e) the probability that one sound precedes another, p_j(i)
(read "the conditional probability that sound i precedes sound j").2 Words
of one and two syllables and of differing numbers of sounds were treated
separately. The principal objective was to estimate the information of
sounds and digrams. This would lead to an estimate of the redundancy that
might be ascribed to the phonetic structure of words. A secondary objective
was enumerative, to find the probability of the sounds and digrams
represented by the sample.

2 At first thought, it may appear that values d and e should be identical.
An analogy will make the disparity clear. In ordinary spelling, p_i(j)
equals unity when the i-symbol is the letter q and the j-symbol is the
letter u, for q is always followed by the letter u, as in quay, quiet, etc.
However, as one "reverses his field" and starts backward from the letter
u, he finds that many letters may precede it, as lute, rule, etc. Thus,
p_j(i) < 1 when the letter u is the j-symbol.
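
The difference between the "follows" and "precedes" probabilities can be
checked on the spelling analogy itself. The sketch below is illustrative
only: the function name and the four-word list are assumptions made for the
example, and letters stand in for sounds. Digrams are counted within words
only.

```python
from collections import Counter


def digram_statistics(words):
    """Return joint, 'follows', and 'precedes' probabilities for digrams.

    joint[(i, j)]    estimates p(i,j), the joint probability of the digram
    follows[(i, j)]  estimates p_i(j), the probability that j follows i
    precedes[(i, j)] estimates p_j(i), the probability that i precedes j
    """
    pair_counts = Counter()
    for word in words:
        # Adjacent pairs inside one word; no pair spans two words.
        for i, j in zip(word, word[1:]):
            pair_counts[(i, j)] += 1

    total = sum(pair_counts.values())
    first_counts = Counter()   # times each symbol occurs as the i-symbol
    second_counts = Counter()  # times each symbol occurs as the j-symbol
    for (i, j), n in pair_counts.items():
        first_counts[i] += n
        second_counts[j] += n

    joint = {pair: n / total for pair, n in pair_counts.items()}
    follows = {(i, j): n / first_counts[i] for (i, j), n in pair_counts.items()}
    precedes = {(i, j): n / second_counts[j] for (i, j), n in pair_counts.items()}
    return joint, follows, precedes


if __name__ == "__main__":
    joint, follows, precedes = digram_statistics(["quay", "quiet", "lute", "rule"])
    print(follows[("q", "u")])   # q is always followed by u in this list
    print(precedes[("q", "u")])  # but u is also preceded by l and r
```

In this small list every q is followed by u, yet only half of the
occurrences of u as a second letter are preceded by q, which is the
disparity the footnote describes.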

The narrowness of the problem is emphasized. The sample was not
continuous language and it might be regarded as a zero-order approximation
to the language of speech, i.e., an array of words not weighted by their
probability. In contrast, the studies cited above, e.g., 6, 11-14, 18-25,
involved first-order approximations to language, words weighted by their
frequency of occurrence. Thus, interpretation and application of the
present results are to matters pertaining to single words, for example,
words that are enumerated as oral drills, singularly loaded with a phoneme
in three positions: initial, final, and "medial". Should the drill-maker
wish to employ probable and improbable phonetic environments for the
phoneme, based on an unweighted population of "root" words, the present
material would be relevant.


PROCEDURE 

The sample of words included 1549 of one syllable and 2151 of two
syllables. They had been selected from words of Thorndike ratings 1-10
through excluding homonyms, homographs, and words of greatest and least
intelligibility (included: > 20%, < 80% when heard through headsets in
high-level noise and recorded as write-down items). The sample was available
with the phonetic spellings entered on IBM cards (2). The representativeness
of the selected sample with reference to the Thorndike list was checked in
some particulars. For example, the proportion of initial letters had not
been altered beyond "chance". However, words of two sounds among
one-syllable words and words of seven or more sounds among two-syllable
words were disproportionately infrequent. The principal analyses were
applied to five categories of words, designated by an asterisk in the
following summary of the sample:


Sounds      1 Syllable      2 Syllable

  2             91              --
  3            679*             47
  4            628*            528*
  5            151             765*
  6             --             555*
  7             --             256


A major limitation of the sample would seem to lie in the fact that the
words were only root forms, present tense, etc.; no plurals, etc.

Three students of speech transcribed the words phonetically; two had
taught phonetics. All used General American pronunciation. The work of
each transcriber was reviewed by the remaining two. The 49 sounds that were
used are indicated in Table 1.* They included four syllabic sounds, [l̩],
[r̩], [m̩], and [n̩]; these reduced the schwas of the alternative
transcriptions [əl], [ər], [əm], or [ən]. Subsequently, because of small
numbers of entries in some cells the populations of some related sounds were
pooled: all sounds of an r character, [r], [ɚ], [ɝ], and [r̩]; [eɪ] and [e];
[oʊ] and [o]; [l] and [l̩]; [m] and [m̩]; and [n] and [n̩], reducing the
categories to 41. The IBM cards were dichotomized by syllables, sub-sorted
according to number of sounds, sorted within each category for the presence
of each sound, and re-sorted for each sound adjacent to each other sound.

The procedures of information theory assume that a sample is
representative of an infinite population. This assumption is rarely
fulfilled in studies of language. For example, in a study of vocabulary
involving nearly a third of a million words uttered by college students in
classroom speeches, one-third of the 6000 different words were used only
once (1).

In a sample of continuous speech each sound would be both an i-sound
and a j-sound in successive digrams. However, the transcription of running
speech would either include a space at the end of a word or reserve the
space for arbitrarily defined pauses. The space might be treated as a symbol
in which case each sound would be, as stated above, both i and j at
different times.3 This condition would not obtain if the discreteness of
each word were preserved. For example, in a three-sound word only the first
two sounds can be i's and the last two j's, and only one sound can be both
i and j in treatments of the digrams. This might be called an end problem
and it occurs with each word in the population. In the word top, [tɑ] and
[ɑp] are (i,j) digrams but there is no opportunity for [t] to be j nor for
[p] to be i if the word is treated singly. Thus, in determining the average
information of digrams, the number of sounds from which the (i,j) digrams
were derived in each category of words was the number of phonemes in the
words minus the number of words of the category. Shannon retained the
identity of the word in his work, noting, "A word is a cohesive group of
letters with strong internal statistical influences...." (15). A
consequence of this treatment is the possibility that information values of
i's and j's will differ if sounds occur with differing frequencies in
initial and final positions.
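
The end problem can be made concrete with a small count. The sketch below is
an illustration only; the function name and the three sample words are
assumptions, not items from the study's word list. It confirms that, when
each word is treated singly, the number of digrams equals the number of
phonemes minus the number of words.

```python
def within_word_digrams(word):
    """Return the (i, j) digrams of one word, never crossing a word boundary."""
    return list(zip(word, word[1:]))


if __name__ == "__main__":
    # Each word is a list of phoneme symbols (hypothetical examples).
    words = [["t", "ɑ", "p"], ["k", "æ", "t"], ["s", "t", "ɑ", "p"]]

    digrams = [d for w in words for d in within_word_digrams(w)]
    n_phonemes = sum(len(w) for w in words)

    # In "top", [t] is never a j-sound and [p] is never an i-sound.
    print(within_word_digrams(["t", "ɑ", "p"]))
    # Digram count equals phonemes minus words.
    print(len(digrams), n_phonemes - len(words))
```

Because the initial sound of every word is lost from the j-population and
the final sound from the i-population, the two populations can have
different distributions, as the paragraph above notes.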


* The  phonetic  alphabet  appears  with  illustrations  in  the  Appendix. 

3 During the writing of this paper the author has received from John B.
Carroll (3) a progress report on a study of probabilities of English
phonemes. He treats a sample of continuous language with a space or 27th
letter occurring between successive words.


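A model for arranging the raw scores for the computational procedures
relative to each category of words follows:

Sound (j) following sound (i)

[The matrix layout and Equation (4) are not recoverable from the source
scan. In the equations that remain, n_ij is the number of occurrences of the
digram (i,j), and n_i and n_j are the numbers of occurrences of the sound as
the first and second member of a digram, respectively.]

\[ p_j(i) = \frac{n_{ij}}{n_j} \tag{1} \]

\[ p_i(j) = \frac{n_{ij}}{n_i} \tag{2} \]

\[ p(i,j) = p(i)\, p_i(j) \tag{3} \]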

RESULTS  AND  DISCUSSION 


Two objectives were stated above: (a) to estimate the information in
the sounds and digrams of words, and (b) to enumerate certain phonetic
probabilities.

A method of estimating average information suggested by Shannon (15, 16)
may be applied progressively through (a) "a circumstance of no intersymbol
influence", (b) a condition that represents the frequency of each symbol,
(c) ... of each digram, (d) trigram, etc. In his notation, F0, F1, and F2
represent successive estimates of H, each succeeding one determined from a
more complete account of the statistics of the language. Thus, in the
present application, F0, F1, and F2 appraise respectively (a) equal
probability of all sounds, (b) observed probability of all sounds, and (c)
observed probability of all sounds in digrams. The relevant formulas for
determining the average information per symbol follow, with illustrative
computations to indicate that with no intersymbol influence and with equally
probable sounds, the solutions for b and c would yield the same value as a.


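\[ F_0 = \log_2 N = -\log_2 \tfrac{1}{41} = 5.35 \text{ bits per sound} \tag{5} \]

\[ F_1 = -\sum_i p(i)\,\log_2 p(i) = 41(0.1306) = 5.35 \text{ bits per sound} \tag{6} \]

\begin{align}
F_2 &= -\sum_{i,j} p(i,j)\,\log_2 p_i(j) \tag{7} \\
    &= -\sum_{i,j} p(i,j)\,\log_2 p(i,j) + \sum_i p(i)\,\log_2 p(i) \\
    &= 41^2(0.00637) - 41(0.1306) \\
    &= 10.70 - 5.35 = 5.35 \text{ bits per sound}
\end{align}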

The foregoing computations illustrate the application of three of
Shannon's formulas for the calculation of average information per symbol.

The maximum average information per digram would be double the value per
sound, 2(5.35) = 10.70 bits. However, to maintain a basis for easy
comparison the present results are stated as average information per sound.
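
The illustrative computation for formulas 5, 6, and 7 can be repeated
numerically. The sketch below is an assumption-laden illustration, not the
procedure used with the IBM cards: the function names are hypothetical, and
the marginal p(i) in the F2 function is taken as the probability of a sound
as the first member of a digram.

```python
import math
from itertools import product


def f0(n_symbols):
    """Formula 5: maximum information, all symbols equally probable."""
    return math.log2(n_symbols)


def f1(p):
    """Formula 6: -sum of p(i) log2 p(i) over single symbols."""
    return -sum(pi * math.log2(pi) for pi in p.values() if pi > 0)


def f2(joint):
    """Formula 7: -sum of p(i,j) log2 p_i(j), with p_i(j) = p(i,j) / p(i)."""
    p_first = {}
    for (i, _), pij in joint.items():
        p_first[i] = p_first.get(i, 0.0) + pij
    return -sum(
        pij * math.log2(pij / p_first[i])
        for (i, _), pij in joint.items()
        if pij > 0
    )


if __name__ == "__main__":
    n = 41
    symbols = range(n)
    uniform = {i: 1 / n for i in symbols}
    uniform_joint = {(i, j): 1 / n**2 for i, j in product(symbols, repeat=2)}
    # With equal probabilities and no intersymbol influence all three agree.
    print(f0(n), f1(uniform), f2(uniform_joint))
    # With complete dependence the second sound carries no information.
    print(f2({(0, 1): 0.5, (1, 0): 0.5}))
```

The uniform case reproduces the single value of about 5.35 bits for all
three estimates; any departure from equal probability or independence lowers
F1 or F2 below F0.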


To the extent that the sounds of the present samples do not occur equally,
the average information per symbol is attenuated from 5.35 bits. Formula 6
(above) was applied to both the i- and j-sounds of the digrams of the words
of each category. First, the formula was applied as written, and then was
altered through substituting j for i. The average information per sound in
bits and the average redundancy follow:


H or F1 (bits) and Redundancy (R)

               one-syllable words            two-syllable words
             i-sounds      j-sounds        i-sounds      j-sounds
             bits   R      bits   R        bits   R      bits   R

3-sound      5.04  .06     4.65  .13        --    --      --    --
4-sound      4.15  .22     4.40  .17       4.68  .13     4.31  .19
5-sound       --    --      --    --       4.48  .16     4.15  .22
6-sound       --    --      --    --       4.40  .13     4.3?   ?


Shannon explains that the information of a digram is either equal to or
less than the sum of the information of each of the symbols of the digram.
Equality can obtain only in the event of no intersymbol influence as in the
illustrative computation in formulas 6 and 7 (above). Thus, the average
information per symbol in the digrams of 3-sound words would be expected to
be less than (5.04 + 4.65)/2, or 4.85 bits. Computations of the information
of the digrams yielded the following average information per symbol and the
indicated values of redundancy:

F2 as an estimate of H (bits) and redundancy (R)

             one-syllable words     two-syllable words
               bits     R             bits     R

3-sound        4.21    .21             --      --
4-sound        3.35    .37            3.89    .27
5-sound         --      --            3.89    .27
6-sound         --      --            3.75    .30

An earlier discussion explained that the numbers of each sound treated as
"digram-sounds" were attenuated because of the end problem. However, for the
sake of comparison, all of the sounds of all categories of words were pooled
and the average information per sound determined. This yielded 4.46 bits per
sound.

First, the trend is obvious, both in the calculations of the average
information per sound of the digrams and of the digrams themselves, that the
symbols of the "shorter" words convey more information than do those of the
"longer" words.

Second, in four of the five instances the average information per sound
was greater when computed on the basis of going from "sound i to the
following sound j" than vice versa. In the "on-going" circumstance, the
final sounds of the words were by definition j's and never i's; in the
"backward looking" instance the initial sounds were only i's and never j's.
Thus, one might conclude that the initial sounds contain more information
than the final sounds of words and also that a "preceding" sound conveys
more information on the average than a "following" sound. This backward look
at a supposedly on-going phenomenon is somewhat irregular. It is remindful
of the changes that are introduced in the identification of the phonetic
character of a preceding sound in synthesized speech by the modification of
a subsequent sound (10).

Third, the phonetic structure of language is such that some sounds tend
to be adjacent more frequently than others. Hence, the average information
per sound in digrams is less than the average information per sound when the
sound is treated as an isolated unit (although in a position to be either
member of a digram). The decrements in information from (a) maximum or F0,
to (b) observed average bits per symbol (mean of i and j) or F1, to (c)
observed bits per "digram symbol" or F2 are in the various instances:


                         (a)     (b)     (c)

3-sound, 1 syllable      5.35    4.85    4.21
4-sound, 1 syllable      5.35    4.28    3.35
4-sound, 2 syllable      5.35    4.50    3.89
5-sound, 2 syllable      5.35    4.32    3.89
6-sound, 2 syllable      5.35    4.37    3.75

[...] through redundancy. The present values relate to an assumption of
maximum utilization of a system of 41 symbols. The second and third values
of Row one could be achieved with 29 and 19 symbols; Row two: 20 and 10
symbols; Row three: 23 and 15 symbols; Row four: 20 and 15 symbols; and Row
five: 21 and 14 symbols. This circumstance would, of course, presume
equi-probable use of the symbols. The listener would have no clue within the
word about what sound was coming next and the vocabulary would include all
permutations of, for example, [dæl], including [æld], [læd], [ldæ], [dlæ],
[dæl], [ædl]. Obviously, as redundancy is reduced, the requirement for
accuracy in symbol-by-symbol reception grows larger.
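
The symbol counts quoted above follow from the relation that H bits per
symbol could be carried by an alphabet of 2 raised to the power H equally
probable symbols. The sketch below is illustrative only; the function name
is hypothetical, and the rounding rule applied in the original to reach
whole numbers of symbols is not stated in the source, so the unrounded
values are printed.

```python
import math


def equivalent_alphabet_size(bits_per_symbol: float) -> float:
    """Number of equally probable symbols that would carry the given bits."""
    return 2.0 ** bits_per_symbol


if __name__ == "__main__":
    # (b) and (c) values for each category, as tabulated in this paper.
    rows = {
        "3-sound, 1 syllable": (4.85, 4.21),
        "4-sound, 1 syllable": (4.28, 3.35),
        "4-sound, 2 syllable": (4.50, 3.89),
        "5-sound, 2 syllable": (4.32, 3.89),
        "6-sound, 2 syllable": (4.37, 3.75),
    }
    for label, (f1_bits, f2_bits) in rows.items():
        print(
            label,
            round(equivalent_alphabet_size(f1_bits), 1),
            round(equivalent_alphabet_size(f2_bits), 1),
        )
    # The maximum, log2(41) bits, corresponds to the full 41 symbols.
    print(equivalent_alphabet_size(math.log2(41)))
```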




On the basis of the above numerical values, the present phonetic code
is being employed with relative efficiency in monosyllables of three sounds.
Possibly the generalization is warranted that the efficiency with which the
phonetic structure of English operates decreases within words of a
particular syllabic length as the number of sounds increases.

The second set of results of this study applies a "frequency-of-
occurrence" tabulation of sounds to the "mono-frequency" sample of words.
For example, Table 1 summarizes an enumeration of the relative frequency of
the 41 sounds in the population of words with initial, "medial", and final
sounds treated separately. The table indicates the probability of each of
the sounds among a population of initial sounds, a population of "medial"
sounds, and of final sounds in a non-repeating, dictionary-like group of
one- and two-syllable words. In this instance there is no "end problem" and
all of the phonemes of the words are represented.

One method of tabulation might show the proportions with which each
phoneme succeeds each other phoneme, these being so arranged that the rows,
for example, would represent the i-sounds, and the columns, the j-sounds.
Frequency or proportion of joint occurrence would be indicated and this
"cell" value would state the probability that the sound j follows the sound
i in one- and two-syllable words of various numbers of sounds. The same
tables could be read vertically through the columns to find p_j(i), or the
probability that an i-sound precedes an observed j-sound. Such tables,
though available, tend to become excessive in size. Accordingly, the more
frequent combinations have been extracted and appear in Tables 2 and 3.

These two tables are not to be interpreted as listing the most frequent pairs
of sounds in the words. This enumeration appears in Table 4. The entries
in Table 2 relate to transitional probability: if i-sound occurs, then the
chances are at least one in 10 that j-sound will occur. The i-sounds stand
before the colon. Table 3 is similar to Table 2 except that the transitional
probabilities are based on p_j(i). However, the i-sounds again are before
the colon as in Table 2.

Table  4,  as  described  above,  lists  the  most  frequent  digrams  in  terms  of 
joint  probability.  The  table  enumerates  the  digrams  that  have  at  least  a 
probability  of  .003  each. 

A feature of digram probability that is revealed by treating words of
various lengths separately is that different transitional probabilities occur
with the same digrams in the different categories of words. The nine isolated
examples that follow are selected from the 41 x 41 matrix to illustrate this
point. In these nine examples, the j-sounds appear over each example and the
i-sounds at the left side.
[The sound symbols heading each example are illegible in the source scan;
cells marked ? are illegible or of uncertain placement.]

               Example 1          Example 2          Example 3
            1-syl.  2-syl.     1-syl.  2-syl.     1-syl.  2-syl.

3-sound       0.7     --         3.7     --         4.5     --
4-sound       4.9     1.9        9.3    39.4        2.2?    0.6
5-sound       --     11.6        --     17.5        --      2.1
6-sound       --     17.8        --     15.8        --      6.1

               Example 4          Example 5          Example 6
            1-syl.  2-syl.     1-syl.  2-syl.     1-syl.  2-syl.

3-sound        ?      --         6.6     --         5.2     --
4-sound      13.5     0.6        6.6     8.9       11.5     1.9
5-sound       --      2.1        --     14.7        --      2.1
6-sound       --      6.1        --     17.8        --      4.8

               Example 7          Example 8          Example 9
            1-syl.  2-syl.     1-syl.  2-syl.     1-syl.  2-syl.

3-sound        ?      --         5.2     --         2.2     --
4-sound        ?     12.1        7.1    18.1        2.7     8.0
5-sound       --     13.?        --     22.3        --     10.3
6-sound       --     12.9        --     21.7        --      8.3


The five values of Example 1, for instance, indicate that in three-
sound words, the particular j-sound followed the particular i-sound with
only one-seventh the probability that the same j-sound followed the same
i-sound in four-sound, one-syllable words. This variability is even greater
in two-syllable words. If the two-syllable word of this example contains
six sounds, the probability is nine times as great as if the word has only
four sounds that when i occurs, j will follow.


EXEMPLIFICATION AND SUMMARY


From a non-repeating population of root words a listener might hear a
sample of five words, one of each of the five lengths that have been treated
here. There is a biased probability with respect to the acoustic events in
each word. The words might begin with the five most frequent initial sounds,
[s], [b], [k], [p], and [r]. The most probable sets of acoustic events in
the five words, together with their probabilities, are: [sut], --, 0.152,
0.156; [kren], --, 0.276, 0.103, 0.280; [plebr], --, 0.197, 0.292, 0.157,
0.157 (on the basis of one-syllable probabilities); [illegible], --, 0.240,
0.13[?], 0.195 (on the basis of two-syllable probabilities); [risɪtn], --,
0.219, 0.173, 0.335, 0.345, 0.219.

By way of further explanation, in three-sound words, [s] having occurred,
[u] has a probability of 0.152 of occurring as the next sound; and [u] having
occurred, [t] is the most probable succeeding sound with a probability of
0.156.

Similar procedures originating with the five most frequent terminal sounds
yield the following results (read the probabilities from right to left):
[kæt], 0.113, 0.118, --;1 [illegible], 0.118, 0.188, 0.135, --; [illegible],
0.125, 0.146, 0.125, 0.146, --; [rink], 0.169, 0.178, 0.227, --; [ristri],
0.220, 0.273, 0.242, 0.185, 0.220, --. The examples indicate that in a
sequence of English phonemes there are transitional probabilities of an
order to indicate at least one chance in 10 that a particular sound will be
next and that these chances may exceed one in three in some sequences.

In summary, the phonetic elements of English in root forms of words
have dissimilar frequencies in the language, both in isolation and as
digrams. These frequencies are not independent of the preceding and
succeeding sounds. When the adjacent sounds are treated as pairs, the
average redundancy is at least .20, and within the categories of words
sampled reached .37. The sounds of words of three phonemes contain more
average information per sound than do the sounds of longer words.

1 In this instance, [t] is the selected final sound. In three-sound words
the most likely sound to precede [t] is [æ] and the probability is 0.118;
in turn, the sound most likely to precede [æ] is [k], 0.113.

Table 1. The proportions of initial, "medial", and final positions of words
of different lengths that contain the indicated sound. The proportions
appear separately for 1- and 2-syllable words. Read as chances in 100.


initial  medial  final 

1-syllable  £- syllable  1-syllable  2-syllable  i-syllable  2-syllaole 


» 

JL 

0.20 

0.37 

4.10 

2.32 

0.60 

0.09 

i 

0.50 

4 «•  80 

7.50 

ic. 71 

9.16 

e-er. 

0.30 

0.37 

4.60 

2.53 

1.30 

1.92 

£ 

0.50 

1.02 

5.30 

3.93 

W 

o.5o 

1.07 

7.20 

3.97 

r\  10 

V • j-vJ 

a 

0.50 

0.84 

4.50 

2.79 

0.10 

0.09 

0 

0.10 

0.79 

3*90 

i.'5 

0.50 

0.05 

o-ov 

o.4o 

0.88 

4.10 

2.08 

0.80 

1.59 

3 

3.96 

5.17 

0.51 

A 
• » 

0.10 

0.37 

5.50 

2.25 

Wrr 

4.70 

6.56 

15.30 

10.06 

3.60 

16.87 

u 

0.10 

3.20 

1.4#; 

0.80 

0.42 

v 

0.30 

0.17 

0.10 

0.14 

0.50 

0.20 

0.30 

a u 

0.10 

0.51 

1.90 

C.6l 

0.30 

0.05 

al 

0.10 

0.28 

3.90 

2,28 

1.20 

O.56 

Ju 

0.10 

0.14 

0.30 

0.46 

0 . TO 

0.23 

P 

6.8c- 

8»lo 

2.4o 

2.69 

tr 

0.79 

u 

8.6o 

6,98 

1.78 

2.00 

0.61 

4. 

u 

6.6o 

4.75 

2.60 

5.75 

17.20 

34.49 

d 

5.00 

6.05 

0.20 

2.86 

3,8o 

6.82 

k 

9.10 

8.52 

2.10 

3.33 

8.70 

2.^*8 

6 

6.20 

2.93 

0.10 

1.17 

2.20 

0.19 

f 

*7  nr\ 

i t v/». 

5.87 

0.20 

1.75 

2.60 

0.70 

V 

1.4o 

1.77 

0.20 

1.44 

2.70 

1.03 

6 

1.80 

0.79 

r\  •>« 

2.60 

0.56 

% 

0.50 

0.33 

O.  V? 

0.30 

0.05 

s 

16.90 

30.57 

2.80 

k.H6 

6.20 

8,27 

Cm 

0.10 

0.19 

o.lo 

0.94 

3.50 

2.06 

$ 

3.00 

0.74 

A HA 
(¥ 

1.90 

1.17 

3 

0.05 

o„09 

h 

4.70 

4.05 

KJmG  { 

t$ 

1.70 

0.38 

0.43 

3.90 

0.33 

d3 

1.30 

0.88 

0*64 

P.rA 

1.96 

m-m 

3.60 

5.35 

1.30 

3.0  C 

4^80 

2.34 

n-n 

2,4o 

i m 

J 1 

5.30 

7.12 

6.50 

10.93 

1-1 

• 

4.90 

3.91 

7.20 

5.86 

6.20 

11.59 

w 

3.80 

2.89 

1.90 

0.92 

hw 

0.80 

0.42 

0.07 

i 

0.60 

2.00 

0,23 

9 

1.20 

0.66 

1.90 

2.01 

Table 2. An enumeration of instances, based on an alphabet of 41 sounds, in
which p_i(j) exceeds one in 10; an asterisk indicates that p_i(j) exceeds
one in five; and an underlined entry, that it exceeds one in four. The
i-sounds precede the colon. The following sounds are pooled: [o, oʊ],
[e, eɪ], [r, r̩, ɝ, ɚ], [m, m̩], [n, n̩], and [l, l̩].

[The bodies of Tables 2 and 3 are not recoverable from the source scan.]

Table 4. An enumeration of instances in which digram probabilities, p(i,j),
of one- and two-syllable words of all lengths pooled exceed .003. An
asterisk indicates that the probability exceeds .010. The i-sounds of the
digram precede the colon.


A. One-syllable words.
N digrams, 3973.


B.  Two-syllable  words. 
N digrams,  9024. 


1 i 
e: 
£ : 
*: 
a: 
0? 
o: 
A : 
av? 
al: 

0* 

p: 

b: 

t: 

it 

k: 

f: 

0: 

s: 

a! 


v 

r 


r,n,rj  ,p,t,s. 
n*,k,t. 


u,l. 
n,s,f 
r#,t,k. 
l,r. 
l,r. 

n,m,a,rj. 


st.\. 


t,d,m. 

k. 


r,i^,I,£. 

r,l,C. 

r*. 

r. 


l,»Ao. 

l,r. 

r.l.t- 

t*,p*,k»,l,w. 

P» 

4*  4 • 4>t 

I,*A. 

>A>'v»'*»£»0,t,5,fl,d,s,(ijr  ,at . 


it 
It 
£ t 

ae: 

a: 

os 

ot 

d: 

At 

ps 

bt 

t: 

d; 

k: 

sr? 

W3  * 

ft 

v: 

s: 

5* 

E! 

n; 
w:


s. 

t #k',d,v,l,d3,p,f ,^,s  ,n,m. 

n,s,r,t,k,d,l. 

n,k,l. 

r,n. 

r. 

r,l« 

n,l,s,s. 

n. 

v, 1,4  fit, 

l.r.z. 

r*A,n, 

lV,l. 

r,o,l,t,z,e,a, 

r. 

r,s. 

^I>k,p,l. 

a. 

d,Z,tir. 
APPENDIX


Copied from Van Riper, C. G. and D. E. Smith, An Introduction to General
American Phonetics, New York: Harper & Brothers, 1954.


CONSONANTS 


Phonetic 

Phone  ti: 

■IfQAw'.  — a* 

?!:  netics 

.OyrnhAl 

£* 

beg,  tub 

[beg  tAb 

IP 

d 

<!b,  and 

du  &BD& 

r 

f 

7sa,  scarf 

ft m ekorf 

8 

£ 

grow,  bug 

gro  b?g 

t 

d5 

judge,  enjoy 
hail,  inhale 

d3Ad3  fndjai 

l 

”V 

h 

hem  inhel 

k 

1 

kick,  uncle 
let,  pal 

kjk  Ajkl 

lit  p*(L 

0 

% 

1 

apple,  turtle 

aepl  tytl 

V 

m 

men,  arm 

men  <tru 

w 

m 

autism,  wisdom 

jty  wxzda 

hw 

n 

nose, 'gain 

no*  gen 

A 

n 

sudden,  curtain 

SAdn  kjrtn 

$1 

wrong,  anger 

rop  \<]E2Tl 

zj 

V0tfKI£ 


Pwcrl  |gh 


uBE^er 

run,  far 
send,  u'i 
Toe,  anT 
shed,  ash 
cKeap.  each 
Thin,  tootS 
then,  breathe 
vow,  have 
wet.  twin 
when,  white 
£Ou, 

pleasure,  vision 
boo,  ooze 


Phonetic g 

[tn^py  darama' 
ran  for 
send  as 
to  «nt 
$£d  «f* 
tjip  its 
Orn  tv& 

0£n  bri5 
vatr  harv 
wet  twxn 
hwen  hwatt. 

Ju  Jet 

plijcrr  vxxa  r 
ru  us 


r>i\ 

ask,  rather 

Task  nab** 

a. 

fathei’7  odd 

ia'.tir  od 

e 

make,  eight 

nek  et 

sp 

sat,  act 

fcMrt  sffct 

i 

ratline,  east 

fa tig  1st 

£ 

red 7 end 

red  end 

1 

it,  since 

it  sins 

ol 

nope, “old 

nop  old] 

DIPH^ONOS 


[9 

sauce,  off 

fsos  Of 

3* 

earn,  fur 

an  far 

S* 

never,  percale 

nevar  p&kel 

u 

truth,  bine 

trufl  blu 

V 

put,  nook 

pot  n*k 

A 

Tpder,~Iove 

Ante*  Iav 

*] 

about,  second 

chart  sgkandj 

Cal  clgb,  aisle  sat  ail 

al/]  new,  owl  naif  avl 


0>iJ 


coy 


oil 


[k.ll  Oil] 


ir.ftfi  VnrumAn  1 AfiknifiQi  S 


